This notebook was motivated by the following post: https://medium.com/@nayafia/what-success-really-looks-like-in-open-source-2dd1facaf91c#.xbtww37yy
It examines Python-based packages to estimate how significant each package is within that ecosystem, using the number of times a package is imported as the measure. It assumes that the relevant packages are already checked out into the directory named by PACKAGES_DIR below.
In [6]:
# Installed Package Imports
import os
import matplotlib
%matplotlib inline
from matplotlib import pylab
from collections import Counter
import pandas
# Custom Code Imports
import sys
sys.path.append('../')
import utils
In [24]:
PACKAGES_DIR = '../../source_packages'
packages = os.listdir(PACKAGES_DIR)
# List the packages being examined
print(', '.join(packages))
In [12]:
# Here we walk the directory, and build a Pandas dataframe of results
relevant_files = utils.yield_relevant(PACKAGES_DIR)
package_generator = utils.yield_all_packages(relevant_files)
df = utils.package_stats(package_generator)
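The `utils` module is not shown in this notebook, so the following is only a minimal sketch of how import counts of this kind could be gathered, not the actual implementation. The function names `imported_modules` and `count_imports` and the example sources are hypothetical. The sketch uses the standard-library `ast` module and counts each top-level module at most once per source file, which may differ from what `utils.package_stats` does.

```python
import ast
from collections import Counter


def imported_modules(source):
    """Return the set of top-level module names imported by one source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # 'import os.path' counts as an import of 'os'
                names.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0): they refer to the package itself
            if node.level == 0 and node.module:
                names.add(node.module.split('.')[0])
    return names


def count_imports(sources):
    """Count, for each module, how many of the given sources import it."""
    counts = Counter()
    for source in sources:
        counts.update(imported_modules(source))
    return counts


# Hypothetical file contents standing in for files found under PACKAGES_DIR
example_sources = [
    "import os\nimport numpy as np\n",
    "from numpy import linalg\nfrom . import helpers\n",
    "import os.path\nimport requests\n",
]
example_counts = count_imports(example_sources)
print(example_counts.most_common())
```

Parsing with `ast` rather than matching text avoids counting the word "import" inside strings or comments, which is one possible source of inflated counts.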
In [29]:
# Some spurious content produces very large import counts for a few terms.
# TODO: Find and fix the cause of these inflated counts
# For now: ignore any count of 40 or more
df_small = df[df['count'] < 40]
print(df_small.describe())
print()
print(df.head())
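One way to start on the debugging noted above is to look at the rows that the cutoff of 40 discards instead of only dropping them. The sketch below runs on a toy frame, because the real columns of `df` other than `count` are not shown here; the `package` column and all of the values are made up for illustration.

```python
import pandas

# Toy data standing in for df; only the 'count' column is known to exist in the real frame
toy = pandas.DataFrame({
    'package': ['numpy', 'requests', 'six', 'os'],
    'count': [12, 7, 3, 950],
})

CUTOFF = 40  # same threshold as used for df_small

# Rows the filter throws away, largest first, for manual inspection
outliers = toy[toy['count'] >= CUTOFF].sort_values('count', ascending=False)
# Rows the filter keeps (the equivalent of df_small)
kept = toy[toy['count'] < CUTOFF]

print(outliers)
```

Listing the discarded rows shows whether the large counts belong to genuinely popular modules or to parsing artifacts.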
In [21]:
plot = df_small['count'].hist()
plot.set_xlabel("Number of times imported")
plot.set_ylabel("Number of packages in bin")